-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Ensure local snapshot staging directory exists for restore #2449
Conversation
Resolves a crash when restoring partition snapshots.
@muhamadazmy this is a pretty dumb bug; the fix is just ensuring that the @jackkleeman adding you as an optional reviewer in case you might have a suggestion for improving the e2e test - feel free to ignore! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @pcholakov for fixing this issue. I left couple of questions below.
server/tests/snapshots.rs
Outdated
use restate_types::config::{LogFormat, MetadataStoreClient}; | ||
use restate_types::net::AdvertisedAddress; | ||
use restate_types::protobuf::cluster::node_state::State; | ||
use restate_types::protobuf::cluster::{NodeState, RunMode}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running cargo clippy --all-targets --workspace -- -D warnings
usually helps with this kind of CI errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I normally do just lint
but point taken - in this case, I did not :-)
@@ -335,6 +335,10 @@ impl StorageOptions { | |||
pub fn data_dir(&self) -> PathBuf { | |||
super::data_dir("db") | |||
} | |||
|
|||
pub fn snapshots_staging_dir(&self) -> PathBuf { | |||
super::data_dir("pp-snapshots") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that super::data_dir
is feature gated by #[cfg(any(test, feature = "test-util"))]
won't this mean it should not be used outside of tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't noticed that, just copied what we already do in data_dir(..)
just above. Doesn't seem to hurt - but I'm somewhat surprised that this works. And we already use that and node_dir()
from production code elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i am surprised that it's working to!
if !self.staging_dir.exists() { | ||
std::fs::create_dir_all(&self.staging_dir)?; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this make more sense if moved to create_if_configured
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately put it closer to the point where we need to use it; a couple of reasons:
- we won't create it if a snapshots destination is configured, but the node only ever produces snapshots, and never downloads any
- we will (re-)create it just-in-time, in case someone or something helpfully deletes the empty staging directory created on startup
Discovered a regression in the snapshot restore path which prevents it from working currently.
This also adds a basic snapshot roundtrip smoke test.